神经网络(NNS)对研究和行业进行了很大的影响。然而,随着NNS的准确性增加,它之后的大小是扩展,所需的计算操作数量和能量消耗。资源消费的增加导致NNS减少的采用率和现实世界部署不切实际。因此,需要压缩NNS以使它们可用于更广泛的受众,同时降低其运行时成本。在这项工作中,我们从因果推理的角度来处理这一挑战,我们提出了一个评分机制,以促进NNS的结构灌注。该方法基于在最大熵扰动下测量互信息,顺序地通过NN传播。我们展示了两种数据集和各种NNS尺寸的方法的表现,我们表明我们的方法在挑战条件下实现了竞争性能。
translated by 谷歌翻译
深度神经网络(DNN)的算法 - 硬件共同设计的最新进展已经证明了它们在自动设计神经架构和硬件设计方面的潜力。然而,由于昂贵的培训成本和耗时的硬件实现,这仍然是一个充满挑战的优化问题,这使得对神经结构和硬件设计难以解答的巨大设计空间探索。在本文中,我们证明我们所提出的方法能够在帕累托前沿定位设计。这种功能由新颖的三相协同设计框架启用,具有以下新功能:(a)从硬件架构和神经结构的设计空间探索的DNN培训解耦,(b)提供硬件友好的神经结构空间通过考虑构造搜索单元的硬件特征,(c)采用高斯过程来预测准确性,延迟和功耗以避免耗时的合成和路由过程。与手动设计的Resnet101,Inceptionv2和MobileNetv2相比,我们可以在想象网数据集中获得高达3倍的准确度,高达5%的准确性。与其他最先进的共同设计框架相比,我们发现的网络和硬件配置可以达到更高的2%〜6%,精度为2倍〜26倍,延迟较高8.5倍。
translated by 谷歌翻译
神经网络在广泛的任务中展示了他们出色的表现。具体地,基于长短短期存储器(LSTM)单元格的复发架构表现出了在真实数据中模拟时间依赖性的优异能力。然而,标准的经常性架构无法估计其不确定性,这对于安全关键型应用如医学,这是必不可少的。相比之下,贝叶斯经常性神经网络(RNN)能够以提高的精度提供不确定性估计。尽管如此,贝叶斯的RNN是在计算上和记忆所要求的,尽管他们的优势尽管他们的实用性限制了他们的实用性。为了解决这个问题,我们提出了一种基于FPGA的硬件设计,以加速基于贝叶斯LSTM的RNN。为了进一步提高整体算法 - 硬件性能,提出了一种共同设计框架来探索贝叶斯RNN的最适合的算法 - 硬件配置。我们对医疗保健应用进行了广泛的实验,以证明我们的设计和框架的有效性的提高。与GPU实施相比,我们的FPGA的设计可以实现高达10倍的加速,能效率较高的近106倍。据我们所知,这是第一份针对FPGA上的贝叶斯RNN的加速的工作。
translated by 谷歌翻译
神经网络(NNS)已经在广泛的应用中证明了它们的潜力,例如图像识别,决策或推荐系统。然而,标准NNS无法捕获其模型不确定性,这对于包括医疗保健和自治车辆的许多安全关键应用至关重要。相比之下,贝叶斯神经网络(BNNS)能够通过数学接地表达他们预测中的不确定性。尽管如此,BNN尚未广泛用于工业实践,主要是由于其昂贵的计算成本和有限的硬件性能。这项工作提出了一种新的基于FPGA的硬件架构,可以通过Monte Carlo辍学加速BNN推断。与其他最先进的BNN加速器相比,所提出的加速器可以达到高达4倍的能量效率和9倍的计算效率。考虑到部分贝叶斯推断,提出了一种自动框架,探讨了硬件和算法性能之间的权衡。进行广泛的实验以证明我们所提出的框架可以有效地找到设计空间中的最佳点。
translated by 谷歌翻译
The performance of the Deep Learning (DL) models depends on the quality of labels. In some areas, the involvement of human annotators may lead to noise in the data. When these corrupted labels are blindly regarded as the ground truth (GT), DL models suffer from performance deficiency. This paper presents a method that aims to learn a confident model in the presence of noisy labels. This is done in conjunction with estimating the uncertainty of multiple annotators. We robustly estimate the predictions given only the noisy labels by adding entropy or information-based regularizer to the classifier network. We conduct our experiments on a noisy version of MNIST, CIFAR-10, and FMNIST datasets. Our empirical results demonstrate the robustness of our method as it outperforms or performs comparably to other state-of-the-art (SOTA) methods. In addition, we evaluated the proposed method on the curated dataset, where the noise type and level of various annotators depend on the input image style. We show that our approach performs well and is adept at learning annotators' confusion. Moreover, we demonstrate how our model is more confident in predicting GT than other baselines. Finally, we assess our approach for segmentation problem and showcase its effectiveness with experiments.
translated by 谷歌翻译
Landing an unmanned aerial vehicle unmanned aerial vehicle (UAV) on top of an unmanned surface vehicle (USV) in harsh open waters is a challenging problem, owing to forces that can damage the UAV due to a severe roll and/or pitch angle of the USV during touchdown. To tackle this, we propose a novel model predictive control (MPC) approach enabling a UAV to land autonomously on a USV in these harsh conditions. The MPC employs a novel objective function and an online decomposition of the oscillatory motion of the vessel to predict, attempt, and accomplish the landing during near-zero tilt of the landing platform. The nonlinear prediction of the motion of the vessel is performed using visual data from an onboard camera. Therefore, the system does not require any communication with the USV or a control station. The proposed method was analyzed in numerous robotics simulations in harsh and extreme conditions and further validated in various real-world scenarios.
translated by 谷歌翻译
We develop theory and methods that use the graph Laplacian to analyze the geometry of the underlying manifold of point clouds. Our theory provides theoretical guarantees and explicit bounds on the functional form of the graph Laplacian, in the case when it acts on functions defined close to singularities of the underlying manifold. We also propose methods that can be used to estimate these geometric properties of the point cloud, which are based on the theoretical guarantees.
translated by 谷歌翻译
Nearly all jurisdictions in the United States require a professional license exam, commonly referred to as "the Bar Exam," as a precondition for law practice. To even sit for the exam, most jurisdictions require that an applicant completes at least seven years of post-secondary education, including three years at an accredited law school. In addition, most test-takers also undergo weeks to months of further, exam-specific preparation. Despite this significant investment of time and capital, approximately one in five test-takers still score under the rate required to pass the exam on their first try. In the face of a complex task that requires such depth of knowledge, what, then, should we expect of the state of the art in "AI?" In this research, we document our experimental evaluation of the performance of OpenAI's `text-davinci-003` model, often-referred to as GPT-3.5, on the multistate multiple choice (MBE) section of the exam. While we find no benefit in fine-tuning over GPT-3.5's zero-shot performance at the scale of our training data, we do find that hyperparameter optimization and prompt engineering positively impacted GPT-3.5's zero-shot performance. For best prompt and parameters, GPT-3.5 achieves a headline correct rate of 50.3% on a complete NCBE MBE practice exam, significantly in excess of the 25% baseline guessing rate, and performs at a passing rate for both Evidence and Torts. GPT-3.5's ranking of responses is also highly-correlated with correctness; its top two and top three choices are correct 71% and 88% of the time, respectively, indicating very strong non-entailment performance. While our ability to interpret these results is limited by nascent scientific understanding of LLMs and the proprietary nature of GPT, we believe that these results strongly suggest that an LLM will pass the MBE component of the Bar Exam in the near future.
translated by 谷歌翻译
The future of population-based breast cancer screening is likely personalized strategies based on clinically relevant risk models. Mammography-based risk models should remain robust to domain shifts caused by different populations and mammographic devices. Modern risk models do not ensure adaptation across vendor-domains and are often conflated to unintentionally rely on both precursors of cancer and systemic/global mammographic information associated with short- and long-term risk, respectively, which might limit performance. We developed a robust, cross-vendor model for long-term risk assessment. An augmentation-based domain adaption technique, based on flavorization of mammographic views, ensured generalization to an unseen vendor-domain. We trained on samples without diagnosed/potential malignant findings to learn systemic/global breast tissue features, called mammographic texture, indicative of future breast cancer. However, training so may cause erratic convergence. By excluding noise-inducing samples and designing a case-control dataset, a robust ensemble texture model was trained. This model was validated in two independent datasets. In 66,607 Danish women with flavorized Siemens views, the AUC was 0.71 and 0.65 for prediction of interval cancers within two years (ICs) and from two years after screening (LTCs), respectively. In a combination with established risk factors, the model's AUC increased to 0.68 for LTCs. In 25,706 Dutch women with Hologic-processed views, the AUCs were not different from the AUCs in Danish women with flavorized views. The results suggested that the model robustly estimated long-term risk while adapting to an unseen processed vendor-domain. The model identified 8.1% of Danish women accounting for 20.9% of ICs and 14.2% of LTCs.
translated by 谷歌翻译
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.
translated by 谷歌翻译